Deep Learning for Lip Reading using Audio-Visual Information for Urdu Language
Authors
Abstract
Lip reading is a challenging task for humans. It requires not only knowledge of the underlying language but also visual cues to predict spoken words. Experts need a certain level of experience and an understanding of visual expressions to decode spoken words. Nowadays, deep learning makes it possible to translate lip sequences into meaningful words. Speech recognition in noisy environments can be improved with visual information [1]. To demonstrate this, in this project we have tried to train two deep-learning models for lip reading: the first operates on video sequences and uses a spatiotemporal convolutional neural network, a bidirectional gated recurrent neural network, and Connectionist Temporal Classification (CTC) loss; the second operates on audio, feeding MFCC features to a layer of LSTM cells that outputs the sequence. We have also collected a small audio-visual dataset to train and test our models. Our goal is to integrate both models to improve speech recognition in noisy environments.
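The abstract says the video model is trained with CTC loss. A network trained this way emits one score vector per frame, and a common way to turn those into a label sequence is greedy (best-path) decoding: take the best label per frame, merge consecutive repeats, and drop the blank symbol. The sketch below illustrates that step only. The function name, the blank index 0, and the toy vocabulary are assumptions made for illustration, not details taken from the paper.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    decoded: List[int] = []
    previous = blank
    for scores in frame_scores:
        # Index of the highest-scoring label for this frame.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit a label only when it is not blank and differs from the
        # previous frame's label (a blank in between resets the repeat).
        if best != blank and best != previous:
            decoded.append(best)
        previous = best
    return decoded


# Toy vocabulary, assumed for illustration only.
vocabulary = ["<blank>", "c", "a", "t"]

# Six frames of per-label scores: c, c, blank, a, t, t.
frames = [
    [0.1, 0.7, 0.1, 0.1],
    [0.2, 0.6, 0.1, 0.1],
    [0.8, 0.1, 0.05, 0.05],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
    [0.2, 0.1, 0.1, 0.6],
]

labels = ctc_greedy_decode(frames)
word = "".join(vocabulary[i] for i in labels)
print(word)
```

Greedy decoding is the simplest option; beam search, possibly with a language model, is a frequently used alternative, and the abstract does not state which decoder the authors used.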
Similar resources
Towards Next-Generation Lip-Reading Driven Hearing-Aids: A preliminary Prototype Demo
Speech enhancement aims to enhance the perceived speech quality and intelligibility in the presence of noise. Classical speech enhancement methods are mainly based on audio-only processing, which often performs poorly in adverse conditions where overwhelming noise is present. This paper presents an interactive prototype demo, as part of a disruptive cognitively-inspired multimodal hearing-aid bei...
Comparing the Impact of Audio-Visual Input Enhancement on Collocation Learning in Traditional and Mobile Learning Contexts
This study investigated the impact of audio-visual input enhancement teaching techniques on improving English as Foreign Language (EFL) learners' collocation learning as well as their accuracy concerning collocation use in narrative writing. In addition, it compared the impact and efficiency of audio-visual input enhancement in two learning contexts, namely traditional and mo...
The Effect of Concept Map on Improving English Reading Comprehension Skill of Medical Students’ in General English Language Course for Deep and Permanent Learning
Introduction: This study aimed to investigate the effect of concept map technique for strengthening reading comprehension skill of medical students' in general English language course for deep and permanent learning at North Khorasan University of Medical Sciences. Methods: The current study is a quasi-experimental design with pre-test and post-test and treatment. The population of this study ...
Improving lip-reading performance for robust audiovisual speech recognition using DNNs
This paper presents preliminary experiments using the Kaldi toolkit [1] to investigate audiovisual speech recognition (AVSR) in noisy environments using deep neural networks (DNNs). In particular we use a single-speaker large vocabulary, continuous audiovisual speech corpus to compare the performance of visual-only, audio-only and audiovisual speech recognition. The models trained using the Kal...
nature of information literacy in elementary schools Case study of Persian literature in fourth grade
Background and Aim: Information literacy is a contextual concept that needs to be studied in different contexts, such as schools. Promoting reading literacy is a core instructional objective of the Persian literature curriculum and also a part of information literacy. Understanding the concept of information literacy helps us to understand information literacy in elementary schools and can implement it in...
Journal: CoRR
Volume: abs/1802.05521
Publication date: 2018